Abstract. A central theme in quantum mechanics is nonlocality, which states that a pair of quantum systems 
which are shown not to be physically interacting may nevertheless be impossible to describe as independent 
entities. Quantum theory, however, forbids nonlocal correlations beyond a certain limit. Why is nature only so 
nonlocal and not more? Approaching the question from the direction of statistics and statistical inference, we 
identify a statistical no-signaling principle which states that no information may pass through a disconnected 
channel. We show this principle to be equivalent to the Tsirelson bound on nonlocality for the Bell-CHSH 
inequality. 


Some of the predictions made by quantum mechanics appear to be at odds with common sense. Yet 
quantum mechanics remains the most precisely tested and successful quantitative theory of nature. It is 
therefore believed that even if quantum mechanics is someday replaced, any successor will have to inherit 
at least some of its “preposterous” but highly predictive principles. Perhaps the most counter-intuitive 
quantum mechanical principle is nonlocality dlO: 

Nonlocality: A pair of quantum systems which are shown not to be physically interacting may nevertheless
be impossible to describe as independent entities.

The mystery of nonlocality is not only to understand why nature is as nonlocal as it is, but also to 
understand why nature is not more nonlocal than it is. There are alternative Non-Signaling theories which 
permit nonlocality beyond the quantum limit ill ; why doesn’t nature choose these theories over quantum 
mechanics? Several explanations have been proposed, but none is tight, i.e. none provides a necessary 
and sufficient condition for the quantum limit [4, 5]. We exhibit a protocol (an infinite oblivious transfer) 
which uses ‘superquantum NS-boxes’ to send messages through a disconnected channel, and we propose 
a principle which we call statistical no-signaling which states that such a communication is physically 
impossible. We show that statistical no-signaling in a bipartite setting is equivalent to Tsirelson’s bound 
for the CHSH inequality which we henceforth call the quantum bound on nonlocality. We thus provide 
a conceptual explanation for this bound. Our approach is different from others in that we use statistical 
techniques as opposed to probabilistic techniques; in particular we use Fisher information. 

A famous application of nonlocality is to construct a 1-2 oblivious transfer protocol between two distant 
agents (A)lice and (B)ob. Alice and Bob each possess a mysterious box representing one half of the quantum 
system to be explained. Alice’s box might, for example, contain one half of a singlet state of spin-1/2 
particles, with Bob’s box containing the other half [1,16]. In addition, Alice possesses a pair of bits $x_0$ and 
$x_1$, each of which is a zero or a one. Using boolean algebra and her boxes (the protocol will be described 
later), Alice encodes her pair of bits into a single bit $x^{(1)}$ which she sends across a classical channel to Bob. 
Bob wishes to recover either $x_0$ or $x_1$, but Alice doesn’t know in advance which one. Bob uses the received 
bit $x^{(1)}$, his box, and some boolean algebra to construct an estimate $y_i$ for his desired bit $x_i$. See Figure 1 
later on. 

What is the probability that Bob correctly estimates the bit he wished to know? He has two possible 
sources of knowledge: the bit $x^{(1)}$ he received from Alice, and some mysterious ‘nonlocal’ correlation 
between his box and Alice’s. The strength of such a nonlocal coordination between two systems is encapsulated 
by a number $c \in [-1,1]$ called the Bell-CHSH correlation such that Bob’s probability of guessing 
correctly is $(1 + |c|)/2$ (see Supplementary A). The Bell-CHSH inequality tells us that $|c| \le 1/2$ classically 
[1, 6]. Mathematically, the statement of nonlocality is that c may violate the Bell-CHSH inequality. 
This has been increasingly supported by experiments, culminating in a recent loophole-free 
verification [3]. We think of the Bell-CHSH correlation c as a measure of the strength of the nonlocality 
manifest in our boxes. 
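
As a small numerical illustration of the formula $(1 + |c|)/2$, the following sketch tabulates Bob's success probability at the three values of $|c|$ discussed in the text (the classical bound, Tsirelson's bound, and the PR-box value). The function name `guess_probability` is hypothetical and chosen only for this example.

```python
import math

def guess_probability(c: float) -> float:
    """Bob's probability of guessing his desired bit, (1 + |c|) / 2."""
    if not -1.0 <= c <= 1.0:
        raise ValueError("c must lie in [-1, 1]")
    return (1.0 + abs(c)) / 2.0

# The three regimes named in the text.
regimes = {
    "classical bound, |c| = 1/2": 0.5,
    "quantum (Tsirelson) bound, |c| = 1/sqrt(2)": 1.0 / math.sqrt(2.0),
    "PR-box, |c| = 1": 1.0,
}
for label, c in regimes.items():
    print(f"{label}: P(correct) = {guess_probability(c):.4f}")
```

Only the PR-box value lets Bob recover his chosen bit with certainty; at the quantum bound his success probability is roughly 0.85.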

How large can c be? Tsirelson’s bound tells us that $|c|$ cannot exceed $1/\sqrt{2}$ in a world described by 
quantum mechanics |@]. This quantum bound on nonlocality: 


(1)    $|c| \le 1/\sqrt{2}$



has been tested experimentally, with the current state of the art being an experiment by Kurtsiefer’s group 
which has achieved a value of c which is only 0.0008 ± 0.00082 distant from Tsirelson’s bound |5|. Such 
experimental evidence supports that Tsirelson’s bound indeed holds in the real world. 

Tsirelson’s result is a specifically quantum mechanical fact for which there has been no good conceptual 
explanation. How fundamental is (1)? Must this inequality also hold for any future theory which might 
someday supersede quantum mechanics [9]? We are led to the following question: 


Question: Can we find a plausible physical principle, independent of quantum mechanics, which is 
necessary and sufficient to guarantee that $|c| \le 1/\sqrt{2}$? 

The search for such a principle has a history of about 20 years. It was initially expected that the physical 
principle of relativistic causality (no-signaling) itself restricts the strength of nonlocality US, Hi,: 12]. But 
then it was discovered that no-signaling theories may exist for which $|c| > 1/\sqrt{2}$. This led to the device-independent 
formalism of No-Signaling (NS)-boxes [13] (see also [3]). In particular, maximum violation 
of the Bell-CHSH inequality is achieved by Popescu-Rohrlich (PR)-boxes, which are consistent with relativistic 
causality. Why then, after all, does nature not permit (1) to be violated (as far as we know)? Several 
suggestions have been made. Superquantum correlations lead to violations of the Heisenberg uncertainty 
principle [14, 15], which is another seemingly purely quantum result. PR-boxes would allow distributed 
computation to be performed with only one bit of communication [16], which looks unlikely but doesn’t 
violate any known physical law. Similarly, in stronger-than-quantum nonlocal theories some computations 
exceed reasonable performance limits [17]. The principle of information causality [18] shows that no sensible 
measure of mutual information exists between pairs of systems in superquantum nonlocal theories. 
Finally, it was shown that superquantum nonlocality does not permit classical physics to emerge in the limit 
of infinitely many microscopic systems [19, 20]. Of these, only information causality and macroscopic locality 
give necessary conditions for the quantum bound and neither is known to be sufficient [4]. Thus these 
conditions do not single out quantum mechanics from amongst all possible nonlocal theories, as pointed 
out in li2lh . 

We propose the following statement as a physical principle: 


Statistical no-signaling: No information can pass through a channel whose output is independent of 
its input. 

In this report we formulate a consequence of statistical no-signaling that is equivalent to (1), providing 
a sought-for conceptual explanation for the quantum bound on nonlocality. The novelty of our approach is 
our use of statistical methods. 

Let $x = \mathrm{Bernoulli}(\theta)$ be a Bernoulli random variable held by Alice, which serves as our information 
source. We imagine $\theta \in [-1,1]$ as encoding a message, perhaps in the digits of its binary expansion. Alice 
independently samples m values $A \stackrel{\text{def}}{=} \{x_0, x_1, \ldots, x_{m-1}\}$ from x (the interesting case is $m \to \infty$), which 
she sends through a channel to Bob. Bob receives a set of values $B \stackrel{\text{def}}{=} \{y_0, y_1, \ldots, y_{m-1}\}$ which are also 
independent identically distributed (iid) and which we may consider as realizations of a Bernoulli random 
variable y whose mean is their sample average. We have thus described a noisy channel with input x and 
with output y. In Supplementaries A and B we construct such channels and show that c may be viewed as 
the correlation between their inputs and outputs. 

The Fisher information $\mathcal{I}_B(\theta)$ represents the maximum information about $\theta$ that Bob may have received 
by way of the above protocol. If $\mathcal{I}_B(\theta) = \infty$ then Bob knows $\theta$ ‘on the nose’, while if $\mathcal{I}_B(\theta) = 0$ then Bob 
has received no information about $\theta$ at all. 


Consequence of statistical no-signaling: Given the above setting, if x and y are independent random 
variables then $\mathcal{I}_B(\theta) < \infty$. 


When the number of samples is finite, the Fisher information for a disconnected channel obviously vanishes. 
But when there are infinitely many samples it may happen that $\mathcal{I}_B(\theta) = \infty$. Indeed, we will construct 
a disconnected channel for which $\mathcal{I}_B(\theta) = \infty$ using superquantum NS-boxes. It is in this way that statistical 
no-signaling will imply that superquantum NS-boxes are non-physical and therefore that the quantum 
bound on nonlocality is indeed fundamental. 

As we shall see, the only three possible values of $\mathcal{I}_B(\theta)$ in the $m \to \infty$ limit are 0, 1, and $\infty$. When the 
Fisher information equals zero or one, no information is transmitted about $\theta$. The distinction between these 
cases is discussed in Supplementary D. 


To derive the quantum limit on nonlocality $|c| \le 1/\sqrt{2}$ from statistical no-signaling, we realize a disconnected 
channel in a specific way as a limiting case of the van Dam protocol [13]. This is the same protocol 
that was used to test information causality [18]. 


Alice samples $2^n$ bits $A \stackrel{\text{def}}{=} \{\bar{x}_0, \bar{x}_1, \ldots, \bar{x}_{2^n-1}\}$ from her ±1-valued Bernoulli($\theta$) random variable x, 
which she converts into 0/1-valued bits $\{x_0, x_1, \ldots, x_{2^n-1}\}$ such that $\bar{x}_i = (-1)^{x_i+1}$. She then combines 
these using her NS-boxes, a pair at a time, into one ‘very special’ bit $x^{(n)}$ which she transmits to Bob 
through what we will for now assume is a perfect channel. Bob randomly chooses an index $0 \le i \le 2^n - 1$ 
(Alice does not know in advance which i he will choose), and makes his best guess $y_i$ (respectively, 
$\bar{y}_i \stackrel{\text{def}}{=} (-1)^{y_i+1}$) for Alice’s bit $x_i$ (respectively, $\bar{x}_i$) using $x^{(n)}$ and his NS-boxes. The correlations between 
Alice’s boxes and Bob’s boxes are governed by the Bell-CHSH correlation $c \in [-1,1]$. The process 
described above is called random access coding or oblivious transfer, and it defines a channel from x to y 
(see Supplementary B and Figure 1). Assume first that $|c| < 1$. A short calculation in Supplementaries B 
and C will reveal the following properties in the $n \to \infty$ limit: 


• Random variables x and y are independent. 

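• As for the Fisher information: 

(2)    $\mathcal{I}_B(\theta) = \lim_{n \to \infty} \frac{(2c^2)^n}{1 - c^{2n}\theta^2} = \begin{cases} \infty, & 2c^2 > 1 \text{ (signaling)} \\ 1, & 2c^2 = 1 \text{ (randomness)} \\ 0, & 2c^2 < 1 \text{ (no-signaling)} \end{cases}$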


Statistical no-signaling rules out the first case, from which we deduce that $2c^2 \le 1$, that is, the quantum 
limit on nonlocality (1). 
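
The trichotomy in (2) can be seen numerically by evaluating the finite-n expression $(2c^2)^n / (1 - c^{2n}\theta^2)$ for growing n. The sketch below is only an illustration; the function name `fisher_information` and the sample values of c and $\theta$ are assumptions made for the example.

```python
def fisher_information(c: float, n: int, theta: float = 0.0) -> float:
    """Finite-n expression inside the limit of (2): (2 c^2)^n / (1 - c^(2n) theta^2)."""
    return (2.0 * c * c) ** n / (1.0 - c ** (2 * n) * theta * theta)

# One value of c below, at, and above the quantum bound 1/sqrt(2).
for c in (0.6, 2.0 ** -0.5, 0.8):
    values = [fisher_information(c, n, theta=0.3) for n in (1, 10, 50, 200)]
    print(f"c = {c:.4f}:", ", ".join(f"{v:.3e}" for v in values))
```

Below the bound the values decay to zero, at the bound they settle at one, and above the bound they grow without limit.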

The conceptual explanation for why the channel becomes disconnected as $n \to \infty$ is that the only 
information which passes from Alice to Bob in the van Dam protocol is $x^{(n)}$, which is a communication 
bottleneck. Alice’s information about $\theta$ is contained in her samples $x_0, x_1, \ldots, x_{2^n-1}$, which are combined 
with one another and with random noise from the boxes to become $x^{(n)}$. Conversely, Bob’s estimates 
$y_0, y_1, \ldots, y_{2^n-1}$ are also all recovered from $x^{(n)}$ together with noise introduced by his boxes. But $x^{(n)}$ 
contains less and less information about $\theta$ as n grows to infinity and as more boxes are used, and $x^{(n)}$ 
contains no information at all about $\theta$ in the $n \to \infty$ limit. This disconnects the channel from x to y. See 
Figure 2. 

The $|c| = 1$ case (PR-boxes) requires special consideration. The nonlocal correlation c is independent of 
the characteristics of the classical channel, so we choose the correlation of the classical channel from Alice 
to Bob in the case of $2^n$ samples to be $(c')^n$ for some $1/\sqrt{2} < c' < 1$. This disconnects the classical channel 
from x to y, as we show in Supplementary C, while maintaining $\mathcal{I}_B(\theta) = \infty$. This contradicts statistical 
no-signaling, as required. 

Our approach differs from others in that we use Fisher information as opposed to Shannon information. 
As a result, Alice’s bits were interpreted as samples of a random variable whose mean encodes a message. 
The utility of Fisher information as a measure of the quantity of Bob’s information about $\theta$ stems from the 
Cramér-Rao Lower Bound, which asserts that $\sigma_\theta^2 \stackrel{\text{def}}{=} 1/\mathcal{I}_B(\theta)$ is the lowest uncertainty about the value of $\theta$ 
in terms of error variance that Bob could hope to achieve with an unbiased estimator. 

FIGURE 1. Distributed oblivious transfer (van Dam) protocol [13]. Its basic building block is on 
the left, where Alice inserts $x_0 \oplus x_1$ into her box, receives A, and sends $x^{(1)} = x_0 \oplus A$ to Bob. Bob decides 
that he wants to know the value of $x_j$, and he feeds j into his box, which outputs B. Bob’s estimate of 
$x_j$ is then $x^{(1)} \oplus B$. When there are multiple boxes, Alice concatenates (the process is called wiring). 
For example, with seven boxes, Alice begins with a collection of bits $x_0, x_1, \ldots, x_7$, and she inputs 
$x_{2j} \oplus x_{2j+1}$ into box j, where j = 0, 1, 2, 3, receiving $A_0, A_1, A_2, A_3$ correspondingly. The bits fed 
into the next level of boxes become $x_j^{(1)} \stackrel{\text{def}}{=} x_{2j} \oplus A_j$ with j = 0, 1, 2, 3. The final output $x^{(3)}$ is sent 
to Bob. Bob encodes the address of the bit he wants as the binary number $i_3 i_2 i_1$; for example, if he 
wants $x_2$, then he sets $i_3 = 0$, $i_2 = 1$, and $i_1 = 0$ because 10 is 2 in binary. This binary encoding 
describes a path in his binary tree from a root to a branch, where 0 means ‘go left’ and 1 means ‘go 
right’. Bob inserts $i_3$ into the lowermost box to obtain $B_6$. Setting $k \stackrel{\text{def}}{=} 5 - (1 - i_3)$, he then inserts 
$i_2$ into box k to obtain $B_k$. Finally, setting $l \stackrel{\text{def}}{=} k - (3 - i_3) - (1 - i_2)$, Bob inserts $i_1$ into box l to 
obtain $B_l$. His final estimate for $x_i$ is $y_i = x^{(3)} \oplus B_6 \oplus B_k \oplus B_l$. Further details are given in 
Supplementary B. 

The Central Limit Theorem (CLT) provides a further interpretation of statistical no-signaling. Let $\hat{\theta}$ be 
Bob’s best decoder of $\theta$ based on B, that is, the maximum likelihood estimator. In Supplementary E it is 
shown that as n approaches infinity: 

(3)    $\sqrt{\mathcal{I}_B(\theta)}\,(\hat{\theta} - \theta) \xrightarrow{d} \nu \sim \mathcal{N}(0,1)$,

where d stands in for convergence in distribution. The rightmost term denotes a random variable whose 
distribution is Gaussian centered at 0 with variance 1. We may think of (3) as a form of the CLT in which 
the number of samples has been replaced by the Fisher information. Explicitly, for c = 1 and for $\theta = 0$ 
we recover the usual CLT. Thinking of the Fisher information as the effective number of samples that Bob 
receives, if $2c^2 < 1$ then (3) appears as a retarded or degenerate CLT in which the effective number of 
samples does not increase as $n \to \infty$. Thus Bob’s ability to estimate $\theta$ using $\hat{\theta}$ decreases in the $n \to \infty$ 
limit, which is what we would expect because less information about $\theta$ is passing through the channel. 
Despite the number of samples growing, the effective number of samples does not increase. 

FIGURE 2. The statistical no-signaling condition. The van Dam protocol defines an underlying 
channel which becomes disconnected in the $n \to \infty$ limit. The upper illustration shows this channel 
and the amount of Fisher information about $\theta$ at its input and at its output. When the number 
of nonlocal resources increases unboundedly, the two ends of the channel become disconnected as 
illustrated by a vanishing bottleneck in the lower figure. Statistical no-signaling dictates that in this 
case no information can pass through, which occurs if and only if $2c^2 \le 1$. The case of $2c^2 > 1$ 
leads to a physically unreasonable limit where Bob can fully read off the value of Alice’s $\theta$ through a 
disconnected channel. 


Conclusions 

We have formulated a statistical no-signaling principle which dictates that no information can pass 
through a disconnected channel. Applied to an infinite limit of the van Dam protocol, this principle is 
equivalent to the quantum bound on nonlocality. We may view this fact as an example of asymptotic theory 
in statistics, in which an asymptotic limit allows us to discern statistical properties that are unavailable for 
a finite number of samples. 

Statistical no-signaling is different from the notion of no-signaling in the sense of non-signaling theories 
(NS-boxes). No-signaling pertains to a single pair of boxes, whereas statistical no-signaling is used as a 
condition on the limit of an iterative construction involving infinitely many boxes. Taking statistical no-signaling 
instead of what is traditionally called ‘no-signaling’ as our no-signaling condition, we recover the 
idea that quantum mechanics is indeed the most general nonlocal non-signaling theory. 

Shimony dd and Aharonov id independently suggested that a quantum theory may perhaps be 
based on two axioms, nonlocality and relativistic causality (no-signaling). Aharonov (unpublished) also 
observed that these two axioms, which seem to contradict one another, can be reconciled using uncertainty. 
This idea was virtually abandoned for many years following the discovery of superquantum theories which 
satisfy both axioms. But statistical no-signaling reveals a sense in which this original idea holds true. 
When the number of nonlocal resources increases to infinity, stronger-than-quantum nonlocal theories fail 
to satisfy statistical no-signaling. These superquantum theories approach a signaling limit where Bob can 
recover Alice’s message with complete certainty even though the channel between Alice and Bob is disconnected. 
Quantum nonlocality obeys statistical no-signaling and thus permits only bounded uncertainty 
(pure randomness), $\sigma_\theta^2 \to 1$, or complete uncertainty, $\sigma_\theta^2 \to \infty$, in the limit. 

The statistical no-signaling condition is stronger than the previously identified principle of information causality 
[18]. Violation of statistical no-signaling implies violation of information causality, whereas the converse 
implication is false. This is evident in the derivation of information causality in that paper, where the 
expression of the Fisher information in (2) in the $\theta = 0$ case appears as Equation (23) therein. 

Supplementary A. The bipartite Bell experiment as a noisy symmetric channel 

In this section we recall the definition of the Bell-CHSH correlation c and we formulate the Bell-CHSH 
inequality, establishing notation. We then exhibit c as the correlation of a symmetric binary channel. 


A.1. The Bell-CHSH inequality. Let us recall the classical bipartite Bell experiment. 

Alice and Bob each hold one half of an EPR pair such as a singlet state of spin-1/2 particles. They 
each possess two different measuring instruments which we unimaginatively call ‘instrument zero’ and 
‘instrument one’. Alice measures her particle using one of the instruments, and Bob does the same. Let 
a be the index of the instrument used by Alice and let A be its reading. Similarly, let b and B denote the 
index of an instrument chosen by Bob and its reading. In the language of probability, A and B are ±1-valued 
Bernoulli random variables. The choices of measuring instrument, a and b, may be either 
parameters or 0/1-valued Bernoulli random variables. 

Repeating the experiment for many different EPR pairs, Alice and Bob may compute the correlations of 
their readings A and B for any given pair of indices a and b. Formally, they compute $E[AB \mid a, b]$, the 
expectation of AB conditioned on their choice of a particular pair of measuring instruments a and b. We 
now define the Bell-CHSH correlation c by the formula: 

(4)    $4c = E[AB \mid 0,0] + E[AB \mid 0,1] + E[AB \mid 1,0] - E[AB \mid 1,1]$.


In any theory in which both Alice and Bob’s choices, and the readings of their measuring devices, are 
local, the Bell-CHSH inequality [@] holds: 


(5)    $|c| \le 1/2$.

Locality means that Alice’s readings may only be affected by her own choices (or perhaps by any other 
hidden variables locally at her site), and similarly for Bob’s readings. Quantum mechanically, Alice and 
Bob may violate (5) and hence Quantum Mechanics is nonlocal. 
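
The gap between (5) and Tsirelson's bound can be checked directly from definition (4). The sketch below enumerates all local deterministic strategies and then evaluates (4) for one quantum-style choice of settings. The correlation model $E[AB \mid a, b] = \cos(\alpha_a - \beta_b)$ and the particular angles are assumptions of this example (they correspond to one standard choice of measurements on a maximally entangled pair), not something specified in the text.

```python
import itertools
import math

def chsh_c(E):
    """Bell-CHSH correlation from (4): 4c = E[0][0] + E[0][1] + E[1][0] - E[1][1]."""
    return (E[0][0] + E[0][1] + E[1][0] - E[1][1]) / 4.0

# Local deterministic strategies: Alice's reading depends only on a, Bob's only on b.
best_local = max(
    abs(chsh_c([[A[a] * B[b] for b in (0, 1)] for a in (0, 1)]))
    for A in itertools.product((-1, 1), repeat=2)
    for B in itertools.product((-1, 1), repeat=2)
)

# Assumed quantum model: E[AB | a, b] = cos(alpha_a - beta_b) for these angles.
alpha = (0.0, math.pi / 2)
beta = (math.pi / 4, -math.pi / 4)
E_quantum = [[math.cos(alpha[a] - beta[b]) for b in (0, 1)] for a in (0, 1)]

print("best local |c|:", best_local)
print("quantum c:     ", chsh_c(E_quantum))
```

Mixtures of deterministic strategies cannot exceed the best deterministic value, so the enumeration recovers the classical bound of 1/2, while the assumed quantum correlations reach $1/\sqrt{2}$.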


A.2. The Bell-CHSH correlation c as a channel correlation. Non-signaling (NS)-boxes provide an abstraction 
and an extension of the Bell-CHSH experiment. This time, Alice and Bob each own a box. Each 
such box may be thought of as a complete laboratory containing two measuring devices. Each participant 
inserts their choice of measuring device into their box. The box output is the respective reading of the 
chosen measuring device. 

Alice and Bob share a pair of NS-boxes whose inputs are a and b and whose outputs are Bernoulli random 
variables $A_a$ and $B_b$. Assume now that a, b, $A_a$, and $B_b$ are all 0/1-valued. 

We will show that the Bell-CHSH parameter (4) represents the correlation of a symmetric binary channel 
whose input is the Bernoulli random variable $x \stackrel{\text{def}}{=} \overline{ab}$ and whose output is the Bernoulli random variable 
$y = \overline{A_a \oplus B_b}$, where $\bar{j}$ denotes $(-1)^j$. 

Let $i \in \{-1, 1\}$. Define channel correlations $c_i$ as follows: 

(6)    $c_i \stackrel{\text{def}}{=} E[xy \mid x = i] = P(y = i \mid x = i) - P(y \ne i \mid x = i) = 2P(y = i \mid x = i) - 1$.

With respect to a particular choice of measuring devices a and b, (6) becomes: 

(7)    $c_i(a,b) = E[\overline{A_a \oplus B_b}\;\overline{ab} \mid a, b, \overline{ab} = i] = 2P(A_a \oplus B_b = ab \mid a, b, \overline{ab} = i) - 1$.

Pulling the condition $i = \overline{ab} = (-1)^{ab}$ out of (7) and using $\overline{A_a \oplus B_b} = \bar{A}_a \bar{B}_b$, we obtain: 

(8)    $c_{\overline{ab}}(a,b) = E[\bar{A}_a \bar{B}_b\,\overline{ab} \mid a, b] = (-1)^{ab} E[\bar{A}_a \bar{B}_b \mid a, b]$.

Assume that the channel is symmetric, i.e. that $c = c_{\overline{ab}}(a, b)$ for all a, b. From (7) and (8) we may rewrite 
the Bell-CHSH correlation (4) as: 


(9)    $c = \frac{1}{4}\left(c_1(0,0) + c_1(0,1) + c_1(1,0) + c_{-1}(1,1)\right)$
       $= \frac{1}{4}\left(E[\bar{A}_a \bar{B}_b \mid 0,0] + E[\bar{A}_a \bar{B}_b \mid 0,1] + E[\bar{A}_a \bar{B}_b \mid 1,0] - E[\bar{A}_a \bar{B}_b \mid 1,1]\right)$
       $= 2P(A_a \oplus B_b = ab \mid a, b) - 1$.

The last equality above follows from the channel symmetry: 

(10)    $c = 2P(A_a \oplus B_b = ab \mid a, b, ab = 0) - 1 = 2P(A_a \oplus B_b = ab \mid a, b, ab = 1) - 1 = 2P(A_a \oplus B_b = ab \mid a, b) - 1$.

Equation (9) is our promised interpretation of the Bell-CHSH correlation as a correlation of a noisy 
symmetric binary channel. 
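
The channel picture can be tried out with a toy sampler for a symmetric NS-box pair. The sketch below draws Alice's output uniformly at random (so that neither marginal depends on the other party's input) and sets Bob's output so that $A_a \oplus B_b = ab$ holds with probability $(1 + c)/2$. The names `ns_box` and `estimate_c` are hypothetical, and sampling both outputs jointly is a simulation convenience rather than a model of how physical boxes would operate.

```python
import random

def ns_box(a: int, b: int, c: float, rng: random.Random):
    """Sample 0/1 outputs (A, B) with P(A xor B == a*b) = (1 + c) / 2."""
    A = rng.randint(0, 1)                      # uniform marginal for Alice
    correct = rng.random() < (1.0 + c) / 2.0   # does this round satisfy A xor B = ab?
    B = A ^ (a & b) if correct else A ^ (a & b) ^ 1
    return A, B                                # B is uniform too, since A is

def estimate_c(c: float, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of 2 P(A xor B = ab) - 1 over uniformly random inputs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        a, b = rng.randint(0, 1), rng.randint(0, 1)
        A, B = ns_box(a, b, c, rng)
        hits += (A ^ B) == (a & b)
    return 2.0 * hits / trials - 1.0

print(estimate_c(0.7))
```

The estimate recovers the box correlation c up to sampling noise, in line with the interpretation of c as the correlation of a symmetric binary channel.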


Supplementary B. The van Dam protocol as a noisy symmetric channel 

In this section we recall the construction of the van-Dam protocol libL i3. We then reinterpret this 
protocol as underlying a noisy symmetric binary channel, as a special case of the construction of Supplementary A. 
We compute its correlations, and establish the effect of noise on its classical component. 


B.1. The van Dam protocol. The van Dam protocol realizes an oblivious transfer protocol by means of a 
classical channel and a number of NS-boxes. Each of Alice’s boxes has a corresponding box on Bob’s side, 
and different pairs of boxes are statistically independent. Suppose that Alice has in her possession the bits 
$x_0, \ldots, x_{m-1}$ where $m = 2^n$, $n \ge 1$. Bob wishes to know the value of one of her bits. He may do so by 
specifying the address of the bit whose value he wishes to know via its binary address $i = i_{n-1} i_{n-2} \cdots i_0$. 
For example, if n = 2 then Bob may specify which of the bits $x_0$ to $x_3$ he wants by specifying a binary 
address, 00, 01, 10, or 11. Alice’s bits and Bob’s addresses are encoded into the inputs of $2^n - 1$ NS-boxes 
following a particular protocol which is described next. 

Alice uses outputs of boxes and choices of measuring devices to determine choices of measuring devices 
for other boxes. Such a procedure is called wiring. The wiring of boxes on Alice’s side admits a recursive 
description which we now give. Let $A^{k,j}$ denote the output of the jth box on the kth level on Alice’s side. Let 
also: 

(11)    $f^{k,j}(q_1, q_2) = q_1 \oplus A^{k,j}_{q_1 \oplus q_2}$.

Suppose that Alice wishes to encode m = 4 bits with her boxes. To do so, she first picks two boxes and 
computes: 

(12)    $x_1^{(1)} = f^{1,1}(x_0, x_1)$,    $x_2^{(1)} = f^{1,2}(x_2, x_3)$.

This forms the first level in her construction. The second level then follows: 

(13)    $x^{(2)} = f^{2,1}(x_1^{(1)}, x_2^{(1)})$.

In this example there are only two levels and so $x^{(2)}$ is the bit which Alice transmits to Bob through the 
classical channel. In the case where $m = 2^n$ there will be n levels and thus $x^{(n)}$ is the bit Bob will receive from 
Alice. 

Unbeknownst to Alice, Bob now decides which bit $x_i$ he would like to know the value of. He takes its 
binary address $i = i_{n-1} i_{n-2} \cdots i_0$, and inserts $i_{k-1}$ into all of his boxes whose counterparts are on the kth 
level on Alice’s side. He then uses the values $B^{k,j}$ that he obtains, together with the bit $x^{(n)}$ he received 
from Alice, to construct the decoding function: 

(14)    $y_i = x^{(n)} \oplus B^{1,j_1}_{i_0} \oplus B^{2,j_2}_{i_1} \oplus \cdots \oplus B^{n,j_n}_{i_{n-1}}$.

The values $j_1, \ldots, j_n$ (which boxes Bob uses) are determined by the binary address $i = i_{n-1} i_{n-2} \cdots i_0$ via 
the recursive formula $j_{l-1} = 2 j_l - 1 + i_{l-1}$ for $l = n, n-1, \ldots, 2$, starting from $j_n = 1$. 
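
The wiring and decoding steps can be sketched as a short simulation. The code below is a minimal illustration rather than the authors' implementation: it indexes boxes from 0 (the text indexes them from 1), the names `van_dam_round` and `estimate_correlation` are hypothetical, and each box pair is sampled jointly so that $A \oplus B = ab$ holds with probability $(1 + c)/2$, which is a simulation shortcut.

```python
import random

def van_dam_round(bits, i, c, rng):
    """Alice encodes `bits` (length 2**n); Bob decodes the bit at index i."""
    n = len(bits).bit_length() - 1
    if n < 1 or len(bits) != 1 << n:
        raise ValueError("len(bits) must be 2**n with n >= 1")
    level, boxes = list(bits), []
    for _ in range(n):                       # Alice's wiring, level by level
        row, nxt = [], []
        for j in range(len(level) // 2):
            a = level[2 * j] ^ level[2 * j + 1]
            A = rng.randint(0, 1)            # uniformly random local output
            row.append((a, A))
            nxt.append(level[2 * j] ^ A)     # f(q1, q2) = q1 xor A_{q1 xor q2}, as in (11)
        boxes.append(row)
        level = nxt
    y = level[0]                             # the single transmitted bit x^(n)
    for k in range(n, 0, -1):                # Bob walks from the top level down
        a, A = boxes[k - 1][i >> k]          # the box on level k along Bob's path
        b = (i >> (k - 1)) & 1               # address bit i_{k-1} fed into that box
        error = rng.random() >= (1.0 + c) / 2.0
        y ^= A ^ (a & b) ^ int(error)        # xor in Bob's box output B, as in (14)
    return y

def estimate_correlation(c, n, trials=50_000, seed=0):
    """Monte Carlo estimate of 2 P(y_i = x_i) - 1 for random bits and addresses."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        bits = [rng.randint(0, 1) for _ in range(1 << n)]
        i = rng.randrange(1 << n)
        hits += van_dam_round(bits, i, c, rng) == bits[i]
    return 2.0 * hits / trials - 1.0
```

With c = 1 (PR-boxes) Bob always recovers the requested bit. With |c| < 1 the estimated correlation between his guess and the true bit should be close to $c^n$, since only the parity of the box errors along his path matters.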

The probability that Bob will decode the correct value of the bit he desires is governed by the NS-box 
correlation c. For the simplest case of m = 2 where Alice and Bob share a single pair of boxes, note that 

(15)    $2P(y_{i_1} = x_{i_1} \mid x_{i_1}) - 1 = 2P(f^{1,1}(x_0, x_1) \oplus B^{1,1}_{i_1} = x_{i_1} \mid x_{i_1}) - 1 = 2P(x_0 \oplus A^{1,1}_{x_0 \oplus x_1} \oplus B^{1,1}_{i_1} = x_{i_1} \mid x_{i_1}) - 1$.

As $x_{i_1} = x_0 \oplus i_1 (x_0 \oplus x_1)$, this equals: 

(16)    $2P(x_0 \oplus A^{1,1}_{x_0 \oplus x_1} \oplus B^{1,1}_{i_1} = x_0 \oplus i_1(x_0 \oplus x_1) \mid x_{i_1}) - 1 = 2P(A^{1,1}_a \oplus B^{1,1}_b = ab \mid a = x_0 \oplus x_1,\, b = i_1,\, x_{i_1}) - 1 = c$,

which follows from (9). 

In general, decoding any bit out of $2^n$ possible bits involves using n pairs of NS boxes. Noting that an 
even number of errors, $A \oplus B \ne ab$, will always cancel out in such a construction, leads to [18]: 

(17)    $c^n = 2P(y_i = x_i \mid x_i) - 1$.

We illustrate in the case that n = 2: 

(18)    $P(A_{a_1} \oplus B_{b_1} \oplus A_{a_2} \oplus B_{b_2} = a_1 b_1 \oplus a_2 b_2 \mid a_{1,2}, b_{1,2})$
        $= P(A_{a_1} \oplus B_{b_1} = a_1 b_1 \mid a_1, b_1)\, P(A_{a_2} \oplus B_{b_2} = a_2 b_2 \mid a_2, b_2) + P(A_{a_1} \oplus B_{b_1} \ne a_1 b_1 \mid a_1, b_1)\, P(A_{a_2} \oplus B_{b_2} \ne a_2 b_2 \mid a_2, b_2)$
        $= \frac{1}{2}(1 + c) \cdot \frac{1}{2}(1 + c) + \frac{1}{2}(1 - c) \cdot \frac{1}{2}(1 - c) = \frac{1}{2}(1 + c^2)$.
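
The same parity argument extends to any n: Bob decodes correctly exactly when an even number of the n independent boxes on his path err, each with probability $(1 - c)/2$. The sketch below checks, in exact rational arithmetic, that this probability equals $(1 + c^n)/2$, which is equivalent to (17). The function name `even_error_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb

def even_error_probability(c: Fraction, n: int) -> Fraction:
    """P(an even number of n independent boxes err), each erring w.p. (1 - c)/2."""
    p_err = (1 - c) / 2
    return sum(
        comb(n, k) * p_err ** k * (1 - p_err) ** (n - k)
        for k in range(0, n + 1, 2)
    )

c = Fraction(3, 5)
for n in range(1, 7):
    assert even_error_probability(c, n) == (1 + c ** n) / 2
print(even_error_probability(c, 2))  # agrees with (1 + c^2)/2 from (18)
```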


B.2. van Dam protocol as a symmetric channel. Assume now that instead of a string of bits, Alice 
has in her possession an information source that is a ±1-valued Bernoulli random variable x whose mean 
is $\theta$. Alice generates m iid samples $\bar{x}_0, \ldots, \bar{x}_{m-1}$ from x and converts them into her 0/1-valued bits 
$x_0, x_1, \ldots, x_{m-1}$ by identifying −1 with 0 and 1 with 1. As in (18), the van Dam protocol has a memoryless 
property: 

(19)    $P(y_i \mid x_0, x_1, \ldots, x_{m-1}) = P(y_i \mid x_i)$.

From this it follows that if Alice’s inputs $x_0, x_1, \ldots, x_{m-1}$ are iid then Bob’s outputs $y_0, y_1, \ldots, y_{m-1}$ are 
also iid. Therefore the set $\bar{y}_i \stackrel{\text{def}}{=} (-1)^{y_i+1}$ determines a Bernoulli random variable y. In this way, the van Dam 
protocol may be viewed as a symmetric binary channel whose input is x and whose output is y. By (17) 
the channel correlation is 

(20)    $E[xy \mid x = \bar{x}_i] = 2P(\bar{y}_i = \bar{x}_i \mid \bar{x}_i) - 1 = 2P(y_i = x_i \mid x_i) - 1 = c^n$.


B.3. Noisy classical channel in the van Dam protocol. The preceding discussion of the van Dam protocol 
assumed a perfect classical channel between Alice and Bob. We now relax this assumption. Let $(c')^n$ 
be the correlation underlying the classical channel, where $|c'| \le 1$. Such a channel can be realized by 
concatenating n copies of a noisy symmetric channel whose correlation is $c'$. This correlation depends on 
n, and Alice may construct it as part of the protocol based on her knowledge of n. 

Note first that: 

(21)    $P(z = i \mid x = i) = P(z = i \mid y = i)\, P(y = i \mid x = i) + P(z = i \mid y \ne i)\, P(y \ne i \mid x = i)$
        $= P(z = i \mid y = i)\, P(y = i \mid x = i) + P(z \ne i \mid y = i)\, P(y \ne i \mid x = i)$.

Let y and z be the input and output of a symmetric classical channel. By (6) we may write: 

(22)    $(c')^n = E[yz] = 2P(z = i \mid y = i) - 1$,

and similarly we may rewrite (20) as: 

(23)    $c^n = E[xy] = 2P(y = i \mid x = i) - 1$.

Substituting (22) and (23) into (21) gives us that for the van Dam protocol with a noisy classical channel: 

(24)    $P(z = i \mid x = i) = \left[1 + (cc')^n\right]/2$.

From this we see that $(cc')^n = E[xz]$ is the correlation of the symmetric binary channel defined by the 
van Dam protocol in the case of a classical channel with correlation $(c')^n$ and a Bell-CHSH correlation c. 
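
The step from (21) to (24) is the statement that correlations multiply when two symmetric binary channels are concatenated. The sketch below checks this with exact 2×2 transition matrices; the helper names `symmetric_channel`, `compose`, and `correlation` are hypothetical.

```python
from fractions import Fraction

def symmetric_channel(c):
    """Transition matrix of a symmetric binary channel with correlation c."""
    p = (1 + c) / 2                       # probability that the output equals the input
    return [[p, 1 - p], [1 - p, p]]

def compose(T1, T2):
    """Transition matrix of channel T1 followed by channel T2 (matrix product)."""
    return [[sum(T1[r][k] * T2[k][s] for k in range(2)) for s in range(2)]
            for r in range(2)]

def correlation(T):
    """Correlation 2 P(output = input) - 1 of a symmetric channel."""
    return 2 * T[0][0] - 1

c_n, c_prime_n = Fraction(4, 5), Fraction(1, 2)   # stand-ins for c^n and (c')^n
combined = compose(symmetric_channel(c_n), symmetric_channel(c_prime_n))
print(correlation(combined))                       # equals c_n * c_prime_n
```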


Supplementary C. The van Dam channel disconnects in the $n \to \infty$ limit 

If $|c| < 1$ or $|c'| < 1$ then it follows that: 

(25)    $E[xz] = 2P(z = i \mid x = i) - 1 = (cc')^n \xrightarrow{n \to \infty} 0$.

Therefore, in the $n \to \infty$ limit: 

(26)    $P(z = i \mid x = i) \xrightarrow{n \to \infty} 1/2$.

But also: 

(27)    $P(z = i) = P(z = i \mid x = i)\, P(x = i) + P(z = i \mid x \ne i)\, P(x \ne i) \xrightarrow{n \to \infty} \frac{1}{2}\left(P(x = i) + P(x = -i)\right) = 1/2$.

Combining (26) with (27) gives us that: 

(28)    $P(z \mid x) \xrightarrow{n \to \infty} P(z)$.

Thus x and z are statistically independent in the $n \to \infty$ limit. 


Supplementary D. Transmission of Fisher information through binary channels 

In this section we compute the Fisher information of samples of a Bernoulli random variable sent through 
a binary channel about the mean of the information source. Applying this to the symmetric binary channel 
that underlies the van Dam protocol, we obtain (2) in the main text. We also discuss the case that the 
quantum limit on nonlocality is attained, i.e. that $|cc'| = 1/\sqrt{2}$. 

D.1. Fisher information for a binary channel. Consider a binary channel whose input is a ±1-valued 
Bernoulli random variable x and whose output is another ±1-valued Bernoulli random variable y. The 
channel correlations are defined by means of (6). If the channel is symmetric then 

(29)    $P(y = x) = P(y = -1 \mid x = -1) = P(y = 1 \mid x = 1)$,

from which it follows that $c_{-1} = c_1 = E[xy]$. 

We shall assume a prior distribution for x given by: 

(30)    $P(x = -1 \mid \theta) = \frac{1}{2}(1 + \theta)$,

with parameter $\theta \in [-1,1]$. Using this we may write 

(31)    $P(y = -1 \mid \theta) = P(y = -1 \mid x = -1)\, P(x = -1 \mid \theta) + P(y = -1 \mid x = 1)\, P(x = 1 \mid \theta) = \frac{1}{2}\left[1 + \frac{1}{2}(c_{-1} - c_1) + \frac{1}{2}(c_{-1} + c_1)\theta\right]$.

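Alice sends m iid random samples $\mathcal{X} \stackrel{\text{def}}{=} \{x_1, \ldots, x_m\}$ through the channel. Denote the set of respective 
outputs $\mathcal{Y} \stackrel{\text{def}}{=} \{y_1, \ldots, y_m\}$. The likelihood of $\theta$ given the set $\mathcal{Y}$ is given by the expression: 

(32)    $P(\mathcal{Y} \mid \theta) = P(y = 1 \mid \theta)^{\sum_{i=1}^m 1_{\{y_i = 1\}}}\; P(y = -1 \mid \theta)^{\sum_{i=1}^m 1_{\{y_i = -1\}}}$,

where the indicator random variable of a random event A is given as: 

(33)    $1_A \stackrel{\text{def}}{=} \begin{cases} 1, & A \text{ occurred;} \\ 0, & \text{otherwise.} \end{cases}$

According to (32) the log-likelihood is given by the expression: 

(34)    $\ell(\theta) \stackrel{\text{def}}{=} \log P(\mathcal{Y} \mid \theta) = \sum_{i=1}^m 1_{\{y_i = -1\}} \log P(y = -1 \mid \theta) + \sum_{i=1}^m 1_{\{y_i = 1\}} \log P(y = 1 \mid \theta)$.

The Fisher information about $\theta$ contained in the set $\mathcal{Y}$ is defined as: 

(35)    $\mathcal{I}_{\mathcal{Y}}(\theta) \stackrel{\text{def}}{=} E\left[\left(\frac{\partial \ell(\theta)}{\partial \theta}\right)^2\right] = -E\left[\frac{\partial^2 \ell(\theta)}{\partial \theta^2}\right]$.

Note that: 

(36)    $E\left[\sum_{i=1}^m 1_{\{y_i = s\}}\right] = \sum_{i=1}^m E\left[1_{\{y_i = s\}}\right] = m P(y = s \mid \theta)$,    $s = -1, 1$.

Using this, (35) reads: 

(37)    $\mathcal{I}_{\mathcal{Y}}(\theta) = \frac{m\left[\frac{1}{2}(c_{-1} + c_1)\right]^2}{1 - \left[\frac{1}{2}(1 + \theta)c_{-1} - \frac{1}{2}(1 - \theta)c_1\right]^2}$.

For a symmetric binary channel, with $c = c_{-1} = c_1$, Equation (37) simplifies to: 

(38)    $\mathcal{I}_{\mathcal{Y}}(\theta) = \frac{mc^2}{1 - c^2\theta^2}$.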
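
The closed form (38) can be cross-checked numerically. The sketch below computes the Fisher information of m iid outputs of a symmetric channel directly from the definition, as m times the expected squared score of a single output, with the score approximated by a central finite difference. It uses $P(y = -1 \mid \theta) = \frac{1}{2}(1 + c\theta)$, the symmetric case of (31). The function names and the step size `h` are assumptions of this example.

```python
import math

def p_minus(theta: float, c: float) -> float:
    """P(y = -1 | theta) for a symmetric channel, from (31) with c_{-1} = c_1 = c."""
    return 0.5 * (1.0 + c * theta)

def fisher_numeric(theta: float, c: float, m: int = 1, h: float = 1e-6) -> float:
    """m * E[(d/dtheta log P(y | theta))^2], using a central finite difference."""
    outcomes = (lambda t: p_minus(t, c), lambda t: 1.0 - p_minus(t, c))
    total = 0.0
    for prob in outcomes:
        score = (math.log(prob(theta + h)) - math.log(prob(theta - h))) / (2.0 * h)
        total += prob(theta) * score ** 2
    return m * total

def fisher_closed_form(theta: float, c: float, m: int = 1) -> float:
    """Equation (38): m c^2 / (1 - c^2 theta^2)."""
    return m * c * c / (1.0 - c * c * theta * theta)

print(fisher_numeric(0.3, 0.7, m=8), fisher_closed_form(0.3, 0.7, m=8))
```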

Note that the minimum of $\mathcal{I}_{\mathcal{Y}}(\theta)$ is obtained for $\theta = 0$, in which case $P(x \mid \theta) = 1/2$ and $\mathcal{I}_{\mathcal{Y}}(0) = mc^2$. 

D.2. Fisher information in the van Dam protocol. Alice begins with $m = 2^n$ iid samples. As shown 
in Supplementary B, the van Dam protocol defines a channel from Alice to Bob whose correlation is $(cc')^n$, where 
c is the Bell-CHSH correlation and $c'$ is the correlation of the classical channel used by Alice in her construction. 
The channel input and output are the random variables x and z. The maximum amount of Fisher 
information that Bob may receive about $\theta$ is attained after he has asked for all of Alice’s bits, i.e. after repeating 
the protocol $2^n$ times, where in each run Bob inputs the indices of his newly requested bit into his 
boxes. Let B be the set of outputs on Bob’s end. According to (38): 

(39)    $\mathcal{I}_B(\theta) = \frac{\left[2(cc')^2\right]^n}{1 - (cc')^{2n}\theta^2}$.

This simplifies to (2) when $|c'| = 1$. 

D.3. Interpretation of the case in which the quantum limit on nonlocality is attained. Fisher information 
about $\theta$ is a function of $\theta$. We see in (39) that this function is identically 1 for $2(cc')^2 = 1$ in the 
$n \to \infty$ limit. One is also the minimum amount of Fisher information contained in a single bit on Alice’s 
end, namely, when n = 0 and $\theta = 0$. When the quantum limit on nonlocality is attained, Bob could receive 
the same amount of information about $\theta$ as he could attain by the van Dam protocol by just tossing a fair 
coin. We thus consider this $\mathcal{I}_B(\theta) = 1$ case to be an instance of no-signaling in which Bob’s decoded bits 
carry no information about the actual value of $\theta$. 


Supplementary E. Statistical no-signaling and the Central Limit Theorem 


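Bob may estimate the quantity $c^n\theta$ from his decoded iid samples by computing the sample mean: 

(40)    $\widehat{c^n\theta} = \frac{1}{2^n} \sum_{i=0}^{2^n - 1} \bar{y}_i$,

which, by the strong law of large numbers, is unbiased and converges almost surely to $c^n\theta$. This is also a 
maximum likelihood estimator as its variance attains the Cramér-Rao bound. In particular, 

(41)    $\mathrm{Var}(\widehat{c^n\theta}) = c^{2n}\, \mathrm{Var}(\hat{\theta}) = \frac{\mathrm{Var}(y)}{2^n} = \frac{1 - c^{2n}\theta^2}{2^n}$,    so that    $\mathrm{Var}(\hat{\theta}) = 1/\mathcal{I}_B(\theta)$.

The Central Limit Theorem governs the convergence of $\widehat{c^n\theta}$ to $c^n\theta$ as $n \to \infty$: 

(42)    $\sqrt{\frac{2^n}{\mathrm{Var}(y)}}\,\left(\widehat{c^n\theta} - c^n\theta\right) \xrightarrow{d} \nu \sim \mathcal{N}(0,1)$,

where d means convergence in distribution. Thus: 

(43)    $\sqrt{\frac{(2c^2)^n}{1 - c^{2n}\theta^2}}\,\left(\hat{\theta} - \theta\right) \xrightarrow{d} \nu \sim \mathcal{N}(0,1)$.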
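
A Monte Carlo sketch can illustrate (43) at finite n. It draws $2^n$ iid ±1-valued outputs with mean $c^n\theta$, forms $\hat{\theta}$ by dividing the sample mean by $c^n$, and rescales the error by the square root of the Fisher information. The function name, the parameter values, and the number of trials are assumptions of this example. It samples Bob's outputs directly from their stated distribution rather than running the full protocol.

```python
import random
import statistics

def standardized_errors(c, n, theta, trials=2000, seed=1):
    """Samples of sqrt(I_B(theta)) * (theta_hat - theta) for m = 2**n outputs."""
    rng = random.Random(seed)
    m = 2 ** n
    mean_y = c ** n * theta                       # mean of each +-1-valued output
    fisher = (2 * c * c) ** n / (1 - c ** (2 * n) * theta ** 2)
    errors = []
    for _ in range(trials):
        total = sum(1 if rng.random() < (1 + mean_y) / 2 else -1 for _ in range(m))
        theta_hat = (total / m) / c ** n          # rescaled sample mean, cf. (40)
        errors.append(fisher ** 0.5 * (theta_hat - theta))
    return errors

z = standardized_errors(c=0.9, n=8, theta=0.3)
print(statistics.mean(z), statistics.pvariance(z))
```

The printed mean should be near 0 and the variance near 1, consistent with the standard normal limit in (43).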









